When Two Is Better Than One: A Study of Ranking Paradigms and Their Integrations for Subtopic Retrieval

Authors

  • Teerapong Leelanupab
  • Guido Zuccon
  • Joemon M. Jose

Abstract

In this paper, we consider the problem of document ranking in a non-traditional retrieval task called subtopic retrieval. This task involves promoting, at early ranks, relevant documents that cover many subtopics of a query, thus providing diversity within the ranking. In recent years, several approaches have been proposed to diversify retrieval results. These approaches can be classified into two main paradigms, depending on how the ranks of documents are revised to promote diversity. In the first paradigm, subtopic diversification is achieved implicitly, by choosing documents that are different from each other; in the second, it is achieved explicitly, by estimating the subtopics covered by documents. Within this context, we compare methods belonging to the two paradigms. Furthermore, we investigate possible strategies for integrating the two paradigms, with the aim of formulating a new ranking method for subtopic retrieval. We conduct a number of experiments to empirically validate and contrast the state-of-the-art approaches as well as instantiations of our integration approach. The results show that the integration approach outperforms state-of-the-art strategies with respect to a number of measures.
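To make the two paradigms concrete, here is a minimal Python sketch using two well-known representatives that the abstract itself does not name: Maximal Marginal Relevance (MMR) for the implicit paradigm and an xQuAD-style greedy objective for the explicit paradigm, together with subtopic recall (S-recall@k) as a simple coverage measure. The function names, the toy documents, the similarity values and the subtopic coverage probabilities are all hypothetical assumptions; this is not the authors' implementation, and it does not attempt their integration strategies.

```python
from typing import Callable, Dict, List, Set


def mmr(docs: List[str], rel: Dict[str, float],
        sim: Callable[[str, str], float], k: int, lam: float = 0.5) -> List[str]:
    """Implicit paradigm (MMR): trade relevance against similarity to chosen docs."""
    selected: List[str] = []
    candidates = list(docs)
    while candidates and len(selected) < k:
        def score(d: str) -> float:
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max((sim(d, s) for s in selected), default=0.0)
            return lam * rel[d] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def xquad(docs: List[str], rel: Dict[str, float],
          cover: Dict[str, Dict[str, float]], weights: Dict[str, float],
          k: int, lam: float = 0.5) -> List[str]:
    """Explicit paradigm (xQuAD-style): reward docs covering still-uncovered subtopics."""
    selected: List[str] = []
    candidates = list(docs)
    # Probability that no selected document covers subtopic s so far.
    uncovered = {s: 1.0 for s in weights}
    while candidates and len(selected) < k:
        def score(d: str) -> float:
            novelty = sum(w * cover[d].get(s, 0.0) * uncovered[s]
                          for s, w in weights.items())
            return (1 - lam) * rel[d] + lam * novelty
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
        for s in uncovered:
            uncovered[s] *= 1.0 - cover[best].get(s, 0.0)
    return selected


def s_recall(ranking: List[str], judged: Dict[str, Set[str]],
             n_subtopics: int, k: int) -> float:
    """Fraction of the query's subtopics covered by the top-k documents."""
    covered: Set[str] = set()
    for d in ranking[:k]:
        covered |= judged.get(d, set())
    return len(covered) / n_subtopics


# Hypothetical toy data: d1 and d2 are near-duplicates about subtopic "a".
docs = ["d1", "d2", "d3"]
rel = {"d1": 0.9, "d2": 0.85, "d3": 0.6}
cover = {"d1": {"a": 0.9}, "d2": {"a": 0.9}, "d3": {"b": 0.8}}
weights = {"a": 0.5, "b": 0.5}                    # assumed P(s | q)
judged = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}}  # assumed ground truth


def sim(a: str, b: str) -> float:
    return 0.95 if {a, b} == {"d1", "d2"} else 0.1


by_relevance = sorted(docs, key=rel.get, reverse=True)
print(s_recall(by_relevance, judged, 2, k=2))               # 0.5
print(mmr(docs, rel, sim, k=2))                             # ['d1', 'd3']
print(xquad(docs, rel, cover, weights, k=2))                # ['d1', 'd3']
print(s_recall(mmr(docs, rel, sim, k=2), judged, 2, k=2))   # 1.0
```

Both greedy rerankers skip the near-duplicate d2 in favour of d3, but for different reasons: MMR penalises d2 for its similarity to d1, while the xQuAD-style score discounts subtopic "a" once d1 has covered it.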

Similar resources

Technical Report: A Study of Ranking Paradigms and Their Integrations for Subtopic Retrieval

In this paper, we consider the problem of document ranking in a non-traditional retrieval task, called subtopic retrieval. This task involves promoting relevant documents that cover many subtopics of a query at early ranks, thus providing diversity within the ranking. In recent years, several approaches have been proposed to diversify retrieval results. These approaches can be classified into...

Evaluating subtopic retrieval methods: Clustering versus diversification of search results

To address the inability of current ranking systems to support subtopic retrieval, two main post-processing techniques of search results have been investigated: clustering and diversification. In this paper we present a comparative study of their performance, using a set of complementary evaluation measures that can be applied to both partitions and ranked lists, and two specialized test collec...

Word Type Effects on L2 Word Retrieval and Learning: Homonym versus Synonym Vocabulary Instruction

The purpose of this study was twofold: (a) to assess the retention of two word types (synonyms and homonyms) in short-term memory, and (b) to investigate the effect of these word types on word learning by asking learners to learn their Persian meanings. A total of 73 Iranian language learners studying English translation participated in the study. For the first purpose, 36 freshmen from an ...

FRDC at the NTCIR-11 IMine Task

The FRDC team participated in the IMine task of NTCIR-11, including the subtopic mining and document ranking subtasks for Chinese. In the subtopic mining subtask, we propose two methods to build two-level subtopic hierarchies. Our methods achieve high F-score and H-score, respectively. In the document ranking subtask, we adopt various features for relevant webpage retrieval and document ...

Udel @ NTCIR-11 IMine Track

This paper describes our participation in the Intent Mining track of NTCIR-11. We present our methods and results for both document ranking and subtopic mining. Our ranking methods are based on several data fusion techniques with some variations. Our subtopic mining method is a very simple technique that uses query dimensions’ items to form a subtopic.

Publication date: 2010